Level of measurement or scale of measure is a classification that describes the nature of information within the values assigned to variables. Psychologist Stanley Smith Stevens developed the best-known classification with four levels, or scales, of measurement: nominal, ordinal, interval, and ratio. This framework of distinguishing levels of measurement originated in psychology and has since had a complex history, being adopted and extended in some disciplines and by some scholars, and criticized or rejected by others. Other classifications include those by Mosteller and John Tukey, and by Chrisman.
Scale | Measure property | Mathematical operators | Advanced operations | Central tendency | Variability |
Nominal | Classification, membership | =, ≠ | Aggregate data | Mode | Qualitative variation |
Ordinal | Comparison, level | >, < | Sorting | Median | Range, interquartile range |
Interval | Difference, affinity | +, − | Comparison to a standard | Arithmetic mean | Deviation |
Ratio | Magnitude, amount | ×, / | Ratio | Geometric mean, harmonic mean | Coefficient of variation, studentized range |
Nominal measurement may differentiate between items or subjects based only on their names or (meta-)categories and other qualitative classifications they belong to. Thus it has been argued that even dichotomous data rely on a constructivist epistemology. In this case, discovery of an exception to a classification can be viewed as progress.
Numbers may be used to represent the variables but the numbers do not have numerical value or relationship: for example, a globally unique identifier.
Examples of these classifications include gender, nationality, ethnicity, language, genre, style, biological species, and form. In a university one could also use residence hall or department affiliation as examples.
Nominal measures are based on sets and depend on categories, à la Aristotle: "Invariably one came up against fundamental physical limits to the accuracy of measurement. ... The art of physical measurement seemed to be a matter of compromise, of choosing between reciprocally related uncertainties. ... Multiplying together the conjugate pairs of uncertainty limits mentioned, however, I found that they formed invariant products of not one but two distinct kinds. ... The first group of limits were calculable a priori from a specification of the instrument. The second group could be calculated only a posteriori from a specification of what was done with the instrument. ... In the first case each unit of would add one additional dimension (conceptual category), whereas in the second each unit would add one additional atomic fact." (MacKay, Donald M. (1969), Information, Mechanism, and Meaning, Cambridge, MA: MIT Press, pp. 1–4)
Nominal scales were often called qualitative scales, and measurements made on qualitative scales were called qualitative data. However, the rise of qualitative research has made this usage confusing. If numbers are assigned as labels in nominal measurement, they have no specific numerical value or meaning. No form of arithmetic computation (+, −, ×, etc.) may be performed on nominal measures. The nominal level is the lowest measurement level used from a statistical point of view.
The ordinal scale places events in order, but there is no attempt to make the intervals of the scale equal in terms of some rule. Rank orders represent ordinal scales and are frequently used in research relating to qualitative phenomena. A student's rank in his graduation class involves the use of an ordinal scale. One has to be very careful in making a statement about scores based on ordinal scales. For instance, if Devi's position in his class is 10th and Ganga's position is 40th, it cannot be said that Devi's position is four times as good as that of Ganga. Ordinal scales only permit the ranking of items from highest to lowest. Ordinal measures have no absolute values, and the real differences between adjacent ranks may not be equal. All that can be said is that one person is higher or lower on the scale than another, but more precise comparisons cannot be made. Thus, the use of an ordinal scale implies a statement of "greater than" or "less than" (an equality statement is also acceptable) without our being able to state how much greater or less. The real difference between ranks 1 and 2, for instance, may be more or less than the difference between ranks 5 and 6. Since the numbers of this scale have only a rank meaning, the appropriate measure of central tendency is the median. A percentile or quartile measure is used for measuring dispersion. Correlations are restricted to various rank order methods. Measures of statistical significance are restricted to the non-parametric methods (R. M. Kothari, 2004).
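The point that ordinal codes carry only rank information can be illustrated with a short Python sketch. The responses and both codings below are hypothetical; the two codings preserve the same order and differ only in spacing, which an ordinal scale leaves arbitrary. Under those assumptions the median category is the same for both codings, while the mean depends on the spacing chosen.

```python
from statistics import mean, median_low

# Hypothetical ordinal responses, coded 1 (lowest) to 5 (highest).
responses = [1, 2, 2, 3, 5, 5, 5]

# Two order-preserving codings of the same five categories.
# Both keep the ranking; only the (arbitrary) spacing differs.
codings = {
    "equal_steps": {1: 1, 2: 2, 3: 3, 4: 4, 5: 5},
    "stretched": {1: 1, 2: 2, 3: 3, 4: 10, 5: 100},
}


def summarize(codes):
    """Return (median category, mean of the codes) under one coding."""
    recoded = [codes[r] for r in responses]
    inverse = {value: category for category, value in codes.items()}
    # median_low always returns an actual data value, so it can be
    # mapped back to the category it encodes.
    return inverse[median_low(recoded)], mean(recoded)


results = {name: summarize(codes) for name, codes in codings.items()}

for name, (median_category, mean_code) in results.items():
    print(f"{name}: median category = {median_category}, mean code = {mean_code}")
```

With the equal-step coding the mean falls near category 3; with the stretched coding it falls between the codes for categories 4 and 5, even though the underlying ranks are unchanged.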
In 1946, Stevens observed that psychological measurement, such as measurement of opinions, usually operates on ordinal scales; thus means and standard deviations have no validity, but they can be used to get ideas for how to improve the operationalization of variables. Most psychological data collected by psychometric instruments and tests, measuring cognitive and other abilities, are ordinal, although some theoreticians have argued they can be treated as interval or ratio scales. However, there is little prima facie evidence to suggest that such attributes are anything more than ordinal (Cliff, 1996; Cliff & Keats, 2003; Michell, 2008).
In particular, IQ scores reflect an ordinal scale, in which all scores are meaningful for comparison only. There is no absolute zero, and a 10-point difference may carry different meanings at different points of the scale.
The ratio type takes its name from the fact that measurement is the estimation of the ratio between a magnitude of a continuous quantity and a unit of measurement of the same kind (Michell, 1997, 1999). Most measurement in the physical sciences and engineering is done on ratio scales. Examples include mass, length, time, plane angle, energy and electric charge. In contrast to interval scales, values on a ratio scale can be meaningfully compared using division. Very informally, many ratio scales can be described as specifying "how much" of something (i.e. an amount or magnitude). Ratio scales are often used to express an order of magnitude, as with temperature (see Orders of magnitude (temperature)).
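The difference between interval and ratio scales under division can be sketched in Python. The example assumes temperature in degrees Celsius and Fahrenheit as the interval case (neither has its zero at the absence of temperature) and length in metres and feet as the ratio case; the specific values are illustrative only.

```python
def celsius_to_fahrenheit(celsius):
    """Change of unit AND of zero point: an interval-scale transformation."""
    return celsius * 9 / 5 + 32


def metres_to_feet(metres):
    """Change of unit only: a ratio-scale transformation."""
    return metres / 0.3048


# Interval scale: the quotient of two readings depends on the unit chosen.
cool_c, warm_c = 10.0, 20.0
ratio_in_celsius = warm_c / cool_c
ratio_in_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)

# Ratio scale: the quotient is the same in every unit.
short_m, long_m = 10.0, 20.0
ratio_in_metres = long_m / short_m
ratio_in_feet = metres_to_feet(long_m) / metres_to_feet(short_m)

print(f"temperature: {ratio_in_celsius:.2f} (C) vs {ratio_in_fahrenheit:.2f} (F)")
print(f"length:      {ratio_in_metres:.2f} (m) vs {ratio_in_feet:.2f} (ft)")
```

Because the temperature quotient changes with the unit, a statement such as "twice as hot" has no unit-independent meaning on these scales, whereas "twice as long" does.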
On the other hand, Stevens (1975) said of his own definition of measurement that "the assignment can be any consistent rule. The only rule not allowed would be random assignment, for randomness amounts in effect to a nonrule". Hand says, "Basic psychology texts often begin with Stevens's framework and the ideas are ubiquitous. Indeed, the essential soundness of his hierarchy has been established for representational measurement by mathematicians, determining the invariance properties of mappings from empirical systems to real number continua. Certainly the ideas have been revised, extended, and elaborated, but the remarkable thing is his insight given the relatively limited formal apparatus available to him and how many decades have passed since he coined them."
The use of the mean as a measure of the central tendency for the ordinal type is still debatable among those who accept Stevens's typology. Many behavioural scientists use the mean for ordinal data anyway. This is often justified on the basis that the ordinal type in behavioural science is in fact somewhere between the true ordinal and interval types; although the interval difference between two ordinal ranks is not constant, it is often of the same order of magnitude.
For example, applications of measurement models in educational contexts often indicate that total scores have a fairly linear relationship with measurements across the range of an assessment. Thus, some argue that so long as the unknown interval difference between ordinal scale ranks is not too variable, interval scale statistics such as means can meaningfully be used on ordinal scale variables. Statistical analysis software such as SPSS requires the user to select the appropriate measurement class for each variable. This is intended to keep users from inadvertently performing meaningless analyses (for example, correlation analysis with a variable on a nominal level).
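The idea of tying each variable to a declared measurement class can be sketched as follows. This is not how SPSS or any particular package works; `Level`, `MIN_LEVEL` and `summarize` are hypothetical names, and the mapping from statistics to levels simply follows the central-tendency column of Stevens's typology (mode, median, arithmetic mean).

```python
from enum import IntEnum
from statistics import mean, median_low, mode


class Level(IntEnum):
    """Stevens's four levels, ordered from least to most structure."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Lowest level at which each statistic is treated as meaningful here.
MIN_LEVEL = {
    "mode": Level.NOMINAL,
    "median": Level.ORDINAL,
    "mean": Level.INTERVAL,
}

FUNCTIONS = {"mode": mode, "median": median_low, "mean": mean}


def summarize(values, level, statistic):
    """Compute a statistic only if the declared level permits it."""
    if level < MIN_LEVEL[statistic]:
        raise ValueError(
            f"{statistic} is not meaningful for {level.name.lower()} data"
        )
    return FUNCTIONS[statistic](values)


# Numeric labels for a nominal variable: the mode is allowed, the mean is not.
department_codes = [1, 1, 2, 3]
print(summarize(department_codes, Level.NOMINAL, "mode"))
try:
    summarize(department_codes, Level.NOMINAL, "mean")
except ValueError as error:
    print(error)
```

A check of this kind only works if the declared level is right; it cannot tell whether the user classified the variable correctly in the first place.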
L. L. Thurstone made progress toward developing a justification for obtaining the interval type, based on the law of comparative judgment. A common application of the law is the analytic hierarchy process. Further progress was made by Georg Rasch (1960), who developed the probabilistic Rasch model that provides a theoretical basis and justification for obtaining interval-level measurements from counts of observations such as total scores on assessments.
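As a rough illustration of the Rasch approach, the sketch below uses the dichotomous form in which the model is usually stated: the probability of a correct response depends only on the difference between a person's ability and an item's difficulty, both expressed in logits. The function names and the example values are assumptions made for illustration, and the sketch does not show how abilities are estimated from total scores.

```python
import math


def rasch_probability(ability, difficulty):
    """P(correct response) under the dichotomous Rasch model (logit units)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def log_odds(probability):
    """Natural-log odds of a probability strictly between 0 and 1."""
    return math.log(probability / (1.0 - probability))


difficulty = 0.0
abilities = [-1.0, 0.0, 1.0, 2.0]  # equally spaced, hypothetical persons

probabilities = [rasch_probability(a, difficulty) for a in abilities]
logits = [log_odds(p) for p in probabilities]

# Equal steps in ability give equal steps in log-odds,
# but unequal steps in the probability of success.
logit_steps = [b - a for a, b in zip(logits, logits[1:])]
probability_steps = [b - a for a, b in zip(probabilities, probabilities[1:])]

print("log-odds steps:   ", [round(step, 3) for step in logit_steps])
print("probability steps:", [round(step, 3) for step in probability_steps])
```

The constant log-odds steps are what is meant by interval-level units in this setting: a one-logit difference has the same meaning anywhere on the scale, which is not true of differences in raw probabilities.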
For example, percentages (a variation on fractions in the Mosteller–Tukey framework) do not fit well into Stevens's framework: no transformation is fully admissible.
Although some claim that the extended levels of measurement are rarely used outside of academic geography, graded membership is central to fuzzy set theory, and absolute measurements include probabilities and the plausibility and ignorance in Dempster–Shafer theory. Cyclical ratio measurements include angles and times. Counts appear to be ratio measurements, but the scale is not arbitrary and fractional counts are commonly meaningless. Log-interval measurements are commonly displayed in stock market graphics. All these types of measurements are commonly used outside academic geography, and do not fit well into Stevens's original work.
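One way to see why cyclical measurements sit awkwardly in the four-level scheme is to average two compass headings. The sketch below assumes angles in degrees and uses the common vector-averaging approach to a circular mean; the function name and the headings are hypothetical.

```python
import math


def circular_mean_degrees(angles):
    """Mean direction of angles in degrees, returned in [0, 360].

    Each angle is treated as a unit vector; the mean direction is the
    direction of the vector sum. It is undefined when the vectors cancel
    out (for example, 0 and 180 degrees).
    """
    sin_sum = sum(math.sin(math.radians(a)) for a in angles)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


headings = [350.0, 10.0]  # two directions just either side of north

arithmetic = sum(headings) / len(headings)
circular = circular_mean_degrees(headings)

print(f"arithmetic mean: {arithmetic:.1f} degrees")
print(f"circular mean:   {circular:.1f} degrees")
```

The arithmetic mean points due south (180 degrees), while the circular mean points north, at approximately 0 degrees (equivalently 360). The ordinary mean treats the scale as a line with a fixed origin, which a cyclical scale does not have.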
That is, if Stevens's sone scale genuinely measured the intensity of auditory sensations, then evidence for such sensations as being quantitative attributes needed to be produced. The evidence needed was the presence of additive structure—a concept comprehensively treated by the German mathematician Otto Hölder (Hölder, 1901). Given that the physicist and measurement theorist Norman Robert Campbell dominated the Ferguson committee's deliberations, the committee concluded that measurement in the social sciences was impossible due to the lack of concatenation operations. This conclusion was later rendered false by the discovery of the theory of conjoint measurement by Debreu (1960) and independently by Luce & Tukey (1964). However, Stevens's reaction was not to conduct experiments to test for the presence of additive structure in sensations, but instead to render the conclusions of the Ferguson committee null and void by proposing a new theory of measurement:
Stevens was greatly influenced by the ideas of another Harvard academic, the Nobel Prize-winning physicist Percy Bridgman (The Logic of Modern Physics, 1927), whose doctrine of operationalism Stevens used to define measurement. In Stevens's definition, for example, it is the use of a tape measure that defines length (the object of measurement) as being measurable (and so by implication quantitative). Critics of operationalism object that it mistakes the relations between two objects or events for properties of one of those objects or events (Moyer, 1981a, b; Rogers, 1989).
Michell, J. (1999). Measurement in Psychology – A critical history of a methodological concept. Cambridge: Cambridge University Press.
The Canadian measurement theorist William Rozeboom was an early and trenchant critic of Stevens's theory of scale types.